Brad Strylowski

CDS & Physics Undergraduate

Project 5

Introduction

In this project, I examined how well the Modified Discrete Cosine Transform (MDCT) can reproduce audio. Because sound takes the form of pressure waves, an audio signal can be well approximated by a sum of periodic trigonometric functions. The coefficients of these functions are the least-squares solution of the system formed by the MDCT basis functions and the original signal; because the basis functions are mutually orthogonal, that solution reduces (up to a normalization constant) to the product of the MDCT matrix and the signal vector. We break the signal into multiple windows and, for each window, approximate the signal using the MDCT, storing the result in a given number of bits. Least-squares approximations tend to have larger error near either end of the interval being fitted. This error was reduced by overlapping the windows and averaging the reconstruction across each pair of adjacent windows.
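The orthogonality claim can be checked numerically. The following is a minimal sketch, assuming the common textbook definition of the n-by-2n MDCT matrix with normalization sqrt(2/n); the report does not state the exact definition it used, so `mdct_matrix`, `mdct`, and `gram` are illustrative names rather than the project's own code.

```python
import math

def mdct_matrix(n):
    """Return the n x 2n MDCT matrix as a list of rows.

    Assumed definition: M[i][j] = sqrt(2/n) * cos((i + 1/2)(j + 1/2 + n/2) * pi / n).
    """
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n / 2.0) * math.pi / n)
             for j in range(2 * n)]
            for i in range(n)]

def mdct(M, x):
    """Coefficients y = M x for one window x of length 2n."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def gram(M):
    """Inner products between every pair of basis vectors (rows of M)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in M] for r1 in M]
```

Under this assumed normalization, `gram(mdct_matrix(32))` comes out as twice the identity matrix up to rounding, so the normal equations for the least-squares fit collapse to a fixed multiple of `mdct(M, x)`; no linear system has to be solved.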

Even and Odd Frequencies

I began by examining the MDCT's ability to reconstruct simple sinusoids for a given number of bits per window. Starting with a frequency of 2f, 4 bits per window, and a window size of 32, the reproduced signal contained a noticeable buzzing noise and had a root-mean-square error (RMSE) of 0.0095. A plot of the original and reconstructed signals revealed that, at every peak and trough of the original, the reconstruction broke up into a series of jagged peaks. Repeating this with a frequency of 3f produced a more accurate reconstruction, barely discernible from the original audio. A plot of the two signals showed a smooth reconstructed sinusoid with an RMSE of only 0.0017, very different from the wavy, jagged reproduction of the even-frequency signal.

Our reconstruction of the signal is a sum of trigonometric functions, specifically cosine waves, termed basis functions. With the signal defined as

\[ x_t = \cos\left(\frac{64 f \pi t}{4096}\right), \quad t = 1, \dots, 4096, \]

odd values of f produce cosine waves that cross zero at the same points as our basis functions. This is easiest to visualize with f = 1: the signal is a simple cosine wave that closely follows the first basis function, so the first basis function alone is a nearly perfect approximation of the signal and the other basis functions are barely needed. With even values of f, however, the signal peaks where the basis functions go to zero, so all of the basis functions must be used heavily to approximate it.

Code for Part I

Implementing a Windowing Function

To fix the issue described above, we applied a windowing function, which scales the signal toward zero at the ends of each window; the scaling is then compensated for after reconstruction. Specifically, I used

\[ h_i = \sqrt{2} \sin(\frac{(i-\frac{1}{2})\pi}{2n}) \]

The time-domain signal in each window was multiplied by the window function immediately before the MDCT was applied, and after reconstruction the vectors w2 and w3 were rescaled by multiplying w2 by the second half of the window function and w3 by the first half. This pairing is used because w2 is the reconstruction of the second half of the left (earlier) window, while w3 is the reconstruction of the first half of the right (later) window.
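The windowing step can be sketched as follows, leaving out quantization so that only the effect of the window is visible. The MDCT definition (normalization sqrt(2/n)) and the function names are assumptions, not the project's actual code; `w2` and `w3` correspond to `prev[n + j]` and `w[j]` in the loop.

```python
import math

def mdct_matrix(n):
    """n x 2n MDCT matrix (assumed textbook definition, normalization sqrt(2/n))."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n / 2.0) * math.pi / n)
             for j in range(2 * n)]
            for i in range(n)]

def sine_window(n):
    """h_i = sqrt(2) * sin((i - 1/2) * pi / (2n)) for i = 1, ..., 2n."""
    return [math.sqrt(2.0) * math.sin((i - 0.5) * math.pi / (2 * n))
            for i in range(1, 2 * n + 1)]

def windowed_roundtrip(x, n):
    """Window, transform, invert, re-window, and average overlapping halves."""
    M = mdct_matrix(n)
    h = sine_window(n)
    out, prev = [], None
    for k in range(len(x) // n - 1):
        block = [h[j] * x[k * n + j] for j in range(2 * n)]          # window first
        y = [sum(M[i][j] * block[j] for j in range(2 * n)) for i in range(n)]
        w = [sum(M[i][j] * y[i] for i in range(n)) for j in range(2 * n)]
        if prev is not None:
            # second half of the previous window times second half of h,
            # first half of the current window times first half of h
            out.extend((h[n + j] * prev[n + j] + h[j] * w[j]) / 2.0
                       for j in range(n))
        prev = w
    return out
```

The factor sqrt(2) in the window makes h_i^2 + h_{i+n}^2 = 2 for every i, which is what lets the existing average of the two overlapping halves return the original samples when no quantization is applied. The output covers samples n through the start of the last window, since the first and last half-windows have no overlapping partner.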

Code for Part II

Dependence of RMSE on Number of Bits

Next, I constructed a chord from the frequencies f0, 5/4 f0, 3/2 f0, and 2 f0. I tested how the difference between the original and reconstructed signals (computed as RMSE) depends on the number of bits per window used to encode the signal. Plotting the observed RMSE against the number of bits confirmed that increasing the storage space for the signal decreases the error. The data are best matched by an exponential trendline, which, understood as a law of diminishing returns, explains why high-fidelity audio streaming (64 bits) can be difficult to distinguish from lower-fidelity audio reconstruction (32 or 16 bits).
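One way to see why an exponential trend is plausible is to look at the quantizer on its own. The sketch below assumes a uniform quantizer over [-L, L] with step q = 2L / (2^b - 1) and applies it directly to the chord samples rather than to MDCT coefficients, so it illustrates the trend and does not reproduce the project's measurements. The values of `f0`, `fs`, and `L` are hypothetical.

```python
import math

def quantize(values, bits, L=1.0):
    """Round each value to the nearest multiple of q = 2L / (2**bits - 1)."""
    q = 2.0 * L / (2 ** bits - 1)
    return [round(v / q) * q for v in values]

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

def chord(f0, fs, count):
    """Sum of cosines at f0, 5/4 f0, 3/2 f0, and 2 f0, scaled into [-1, 1]."""
    ratios = (1.0, 1.25, 1.5, 2.0)
    return [sum(math.cos(2.0 * math.pi * r * f0 * t / fs) for r in ratios) / len(ratios)
            for t in range(count)]

signal = chord(f0=256.0, fs=8192.0, count=1024)   # hypothetical f0 and sample rate
errors = {b: rmse(signal, quantize(signal, b)) for b in range(2, 11)}
```

Each sample is off by at most q/2, and q roughly halves with every added bit, so the error bound shrinks geometrically with the number of bits. That matches the shape of an exponential trendline: each extra bit removes about half of the remaining error, which is a smaller and smaller absolute improvement.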

Code for Part III, Data

Reproduction of an Audio File

I evaluated the combined effect of the windowing function and the number of bits on the reproduction of an actual audio file (a clip of Handel's "Hallelujah Chorus"). Ignoring the one-bit data point (the tune was not recognizable as the original), the windowing function, surprisingly, improved only the low-bit reconstructions. For more than three bits, the reconstruction with the windowing function produced the same RMSE as the reconstruction without it.

With Windowing

Code for Part IV, Data

Coding and Decoding an Audio File

Finally, I separated the simplecodec program into a coder (which accepts an audio signal matrix and a number of bits) and a decoder (which accepts the quantized output of the MDCT and a number of bits). To prevent the signal from being scaled by up to 0.001, I removed the lines of code that normalized it. After quantizing the MDCT output for a given window, I appended the vector representing that window to the final output, which I reshaped into a one-dimensional vector before returning. The decoder still averages the reconstruction of the current window with that of the previous window to avoid error near the edges of each window.
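The split described above might look roughly like the following sketch. It is not the project's code: the MDCT normalization, the default window size `n=32`, the quantization range `L=5.0`, and the absence of windowing are all assumptions made for illustration.

```python
import math

def mdct_matrix(n):
    """n x 2n MDCT matrix (assumed textbook definition, normalization sqrt(2/n))."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n / 2.0) * math.pi / n)
             for j in range(2 * n)]
            for i in range(n)]

def coder(x, bits, n=32, L=5.0):
    """Quantized MDCT coefficients of every window, as one flat list of integers."""
    M = mdct_matrix(n)
    q = 2.0 * L / (2 ** bits - 1)              # quantization step (assumed form)
    codes = []
    for k in range(len(x) // n - 1):           # windows overlap by n samples
        block = x[k * n:k * n + 2 * n]
        for row in M:
            y = sum(m * v for m, v in zip(row, block))
            codes.append(int(round(y / q)))    # not clipped to [-L, L] in this sketch
    return codes

def decoder(codes, bits, n=32, L=5.0):
    """Rebuild the signal by inverting each window and averaging the overlaps."""
    M = mdct_matrix(n)
    q = 2.0 * L / (2 ** bits - 1)
    out, prev = [], None
    for k in range(len(codes) // n):
        y = [c * q for c in codes[k * n:(k + 1) * n]]                # dequantize
        w = [sum(M[i][j] * y[i] for i in range(n)) for j in range(2 * n)]
        if prev is not None:
            out.extend((prev[n + j] + w[j]) / 2.0 for j in range(n))
        prev = w
    return out
```

A round trip is `decoder(coder(x, bits), bits)`. The coder and decoder must agree on `bits`, `n`, and `L`, since the integer codes carry no scale information of their own. The decoded output is 2n samples shorter than the input because the first and last half-windows have no overlapping partner.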

coder, decoder